
    Data-Driven Decisions and Actions in Today’s Software Development

    Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or the effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like, but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data, and its stakeholders for decades, and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.

    Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)

    In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy, and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy. Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways, so not all Atg proteins can be used as specific markers for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field.
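
    The flux-versus-steady-state distinction can be made concrete with the widely used LC3-II turnover assay: compare the level of the autophagosome marker LC3-II with and without a lysosomal inhibitor such as bafilomycin A1. As an illustration of the reasoning only (the notation is ours, not a formula prescribed by the guidelines):

        \text{autophagic flux} \;\propto\; [\mathrm{LC3\text{-}II}]_{+\,\mathrm{inhibitor}} \;-\; [\mathrm{LC3\text{-}II}]_{-\,\mathrm{inhibitor}}

    A large difference indicates active delivery to, and degradation within, lysosomes; a small difference despite abundant autophagosomes suggests a downstream block rather than increased autophagy.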

    Efficient software evolution analysis: algorithmic and visual tools for investigating fine-grained software histories

    Software analysis and its diachronic sibling, software evolution analysis, rely heavily on data computed by processing existing software. Countless tools have been created for the analysis of source code, binaries, and other artifacts. The majority of these tools are written for one particular programming language, and their modus operandi typically comprises the analysis of artifacts contained in file system directories representing the current version of a software system. Researchers repurpose these tools for investigating software evolution by analyzing multiple revisions over the lifetime of a project. But even though changes between revisions are usually tiny compared to the size of the affected artifacts, existing software evolution analysis techniques usually rely on redundantly re-analyzing entire files at best, or entire projects at worst, for every additional revision analyzed. These limitations, being tied to a single ecosystem and treating software as a static, timeless construct, affect how we do software evolution research: studies often restrict themselves, rather arbitrarily, to the analysis of only a subset of revisions, instead of the full, high-resolution history of a project. Thus, there exist both a need and the potential for representing and analyzing software artifacts more efficiently. In this thesis, we identify several processes in existing software evolution analysis pipelines that suffer from redundancies and inefficiencies. We then develop purpose-agnostic solutions for improving these processes and combine them in a generic, reusable, and extensible analysis library, called LISA. We evaluate our approach extensively by computing (and publishing) code metrics for millions of program revisions, by testing its generalizability across multiple types of artifacts, analyses, and programming languages, and by applying our tool to conduct concrete code studies. Our findings indicate that analyzing software evolution using traditional tools incurs significant redundancies. We demonstrate that the individual techniques we present generalize to multiple programming languages and artifact types, and that they can accelerate the processing of evolving software by multiple orders of magnitude. Alongside these core findings, our research has resulted in a state-of-the-art, open-source software analysis library, a large public dataset of historical code metrics, and incremental advancements in understanding the pace of software evolution, developer behavior, and the visualization of software evolution.
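
    To illustrate why avoiding re-analysis pays off, here is a minimal Python sketch of the general idea (not LISA's actual implementation): caching analysis results by content hash ensures each distinct version of a file is analyzed exactly once, no matter how many revisions contain it.

        import hashlib

        def loc_metric(source: str) -> int:
            """Toy metric: count non-blank lines of code."""
            return sum(1 for line in source.splitlines() if line.strip())

        def analyze_history(revisions):
            """Analyze a list of revisions, each a dict mapping file path -> source text.

            Results are cached by content hash, so each distinct file version
            is analyzed exactly once across the whole history.
            """
            cache = {}      # content hash -> metric value
            history = []    # per-revision aggregated metric
            for files in revisions:
                total = 0
                for path, source in files.items():
                    key = hashlib.sha1(source.encode("utf-8")).hexdigest()
                    if key not in cache:        # only analyze changed content
                        cache[key] = loc_metric(source)
                    total += cache[key]
                history.append(total)
            return history

        # Two revisions; only b.py changes, so a.py is analyzed once.
        revs = [
            {"a.py": "print('hi')\n", "b.py": "x = 1\n"},
            {"a.py": "print('hi')\n", "b.py": "x = 1\ny = 2\n"},
        ]
        print(analyze_history(revs))  # [2, 3]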

    Reducing Redundancies in Multi-Revision Code Analysis

    Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically analyze code in only one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.
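
    One way to remove redundancy in the internal code representation can be sketched as follows (a toy Python illustration of the general idea; the paper's merged, graph-based representation is more elaborate): store each property value once, together with the revision at which it starts to hold, instead of duplicating it for every revision.

        from bisect import bisect_right

        class RevisionRanges:
            """Store a property's value once per change, keyed by the revision
            at which each value first appears. Assumes set() is called with
            strictly increasing revision numbers."""

            def __init__(self):
                self._revs = []   # revisions where the value changed
                self._vals = []

            def set(self, rev: int, value):
                # Record the value only if it differs from the previous one.
                if not self._vals or self._vals[-1] != value:
                    self._revs.append(rev)
                    self._vals.append(value)

            def get(self, rev: int):
                i = bisect_right(self._revs, rev) - 1
                if i < 0:
                    raise KeyError(f"no value at revision {rev}")
                return self._vals[i]

        complexity = RevisionRanges()
        complexity.set(1, 4)   # value 4 holds from revision 1 ...
        complexity.set(2, 4)   # ... unchanged, so nothing extra is stored
        complexity.set(3, 7)   # changes at revision 3
        print(complexity.get(2), complexity.get(3))  # 4 7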

    Rapid Multi-Purpose, Multi-Commit Code Analysis


    Of Cyborg Developers and Big Brother Programming AI

    The main reason modern machine learning mechanisms outperform hand-crafted solutions is the availability of high-quality data in large quantities. We observe that although many day-to-day activities in software engineering (such as bug triaging, reverting regressions, or even implementing code for properly scoped problems) could possibly be automated, we lack the necessary monitoring tools to capture all relevant information. Bug trackers and version control rely mostly on plain text, and specifications are informal or at best semi-structured. After setting the stage via a short excursion to the year 2047, we discuss how a ubiquitous AI, which can learn from every interaction a human developer has with a machine, could take over more and more of the mundane responsibilities in software engineering. We outline how this change will affect software engineering practice as well as education.

    Evo-Clocks: Software Evolution at a Glance

    Understanding the evolution of a project is crucial in reverse-engineering, auditing and otherwise understanding existing software. Visualizing how software evolves can be challenging, as the underlying data typically forms a multi-dimensional graph structure where individual components undergo frequent but localized changes. Existing approaches typically consider either only a small number of revisions, or they focus on one particular aspect, such as the evolution of code metrics or architecture. Approaches using a static view with a time axis (such as line charts) are limited in their expressiveness regarding structure, and approaches visualizing structure quickly become cluttered with an increasing number of revisions and components. We propose a novel trade-off between displaying global structure over a large time period with reduced accuracy and visualizing fine-grained changes of individual components with absolute accuracy. We demonstrate how our approach displays changes by blending redundant visual features (such as scales or repeating data points) where they are not expressive. We show how using this approach to explore software evolution can reveal ephemeral information when familiarizing oneself with a new project. We provide a working implementation as an extension to our open-source library for fine-grained evolution analysis, LISA.
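
    As a generic illustration of the clock metaphor only (not the paper's actual visual encoding), a component's metric history can be placed on a polar timeline, with the angle encoding the revision and the radius encoding the metric value; a hypothetical sketch using matplotlib:

        import math
        import matplotlib.pyplot as plt

        # Toy metric history for one component, one value per revision.
        metrics = [3, 3, 4, 6, 6, 6, 5, 8, 9, 9]

        angles = [2 * math.pi * i / len(metrics) for i in range(len(metrics))]
        ax = plt.subplot(projection="polar")
        ax.set_theta_zero_location("N")   # revision 0 at "12 o'clock"
        ax.set_theta_direction(-1)        # time runs clockwise
        ax.plot(angles, metrics, marker="o")
        ax.set_title("Component metric over revisions (clock-style timeline)")
        plt.show()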

    Redundancy-free analysis of multi-revision software artifacts

    Researchers often analyze several revisions of a software project to obtain historical data about its evolution. For example, they statically analyze the source code and monitor the evolution of certain metrics over multiple revisions. The time and resource requirements for running these analyses often make it necessary to limit the number of analyzed revisions, e.g., by only selecting major revisions or by using a coarse-grained sampling strategy, which could remove significant details of the evolution. Most existing analysis techniques are not designed for the analysis of multi-revision artifacts, and they treat each revision individually. However, the actual difference between two subsequent revisions is typically very small. Thus, tools tailored for the analysis of multiple revisions should only analyze these differences, thereby preventing re-computation and storage of redundant data, improving scalability, and enabling the study of a larger number of revisions. In this work, we propose the Lean Language-Independent Software Analyzer (LISA), a generic framework for representing and analyzing multi-revision software artifacts. It employs a redundancy-free, multi-revision representation for artifacts and avoids re-computation by only analyzing changed artifact fragments across thousands of revisions. The evaluation of our approach consists of measurements of the effect of each individual technique incorporated, an in-depth study of LISA's resource requirements, and a large-scale analysis over 7 million program revisions of 4,000 software projects written in four languages. We show that the time and space requirements for multi-revision analyses can be reduced by multiple orders of magnitude, when compared to traditional, sequential approaches.
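
    The core saving of analyzing only differences can be illustrated with a delta update of an aggregated metric (a minimal Python sketch under our own toy diff format, not LISA's actual pipeline): when a revision changes only a few files, retract their old contribution and analyze only the new content.

        def loc(source: str) -> int:
            return sum(1 for line in source.splitlines() if line.strip())

        def incremental_totals(base_files, diffs):
            """base_files: dict path -> source at the first revision.
            diffs: one dict per subsequent revision, mapping path -> new
            source (or None if the file was deleted). Only changed files
            are re-analyzed."""
            per_file = {p: loc(src) for p, src in base_files.items()}
            total = sum(per_file.values())
            totals = [total]
            for diff in diffs:
                for path, new_src in diff.items():
                    total -= per_file.pop(path, 0)     # retract old contribution
                    if new_src is not None:
                        per_file[path] = loc(new_src)  # analyze only the change
                        total += per_file[path]
                totals.append(total)
            return totals

        base = {"a.py": "x = 1\n", "b.py": "y = 2\nz = 3\n"}
        print(incremental_totals(base, [{"b.py": "y = 2\n"}, {"a.py": None}]))
        # [3, 2, 1]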

    Completing Function Documentation Comments Using Structural Information

    Source code comments are a cornerstone of software documentation, facilitating feature development and maintenance. Well-defined documentation formats, like Javadoc, make it easy to include structural metadata used to, for example, generate documentation manuals. However, the actual usage of structural elements in source code comments has not been studied yet. We investigate to what extent these structural elements are used in practice, and whether the added information can be leveraged to improve tools assisting developers when writing comments. Existing research on comment generation traditionally focuses on automatic generation of summaries. However, recent works have shown promising results when supporting comment authoring through next-word prediction. In this paper, we present an in-depth analysis of commenting practice in more than 18K open-source projects written in Python and Java, showing that many structural elements, particularly parameter and return value descriptions, are indeed widely used. We discover that while the majority are rather short (about 6 to 9 words), many are several hundred words in length. We further find that Python comments tend to be significantly longer than Java comments, possibly due to the weakly-typed nature of the former. Following the empirical analysis, we extend an existing language model with support for structural information, substantially improving the Top-1 accuracy of predicted words (Python 9.6%, Java 7.8%).
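
    To make "structural elements" concrete: formats like Javadoc mark parameter and return descriptions with explicit tags, which simple tooling can extract mechanically. A minimal, hypothetical Python sketch (a real Javadoc parser handles far more):

        import re

        JAVADOC = """/**
         * Computes the moving average of a series.
         *
         * @param values the input series
         * @param window the averaging window size
         * @return the smoothed series
         */"""

        # Capture @tag and its description from each comment line.
        TAG = re.compile(r"@(param|return|throws)\s+(.*)")

        for line in JAVADOC.splitlines():
            m = TAG.search(line)
            if m:
                print(m.group(1), "->", m.group(2))
        # param -> values the input series
        # param -> window the averaging window size
        # return -> the smoothed series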

    A search-based training algorithm for cost-aware defect prediction

    Research has yielded approaches to predict future defects in software artifacts based on historical information, thus assisting companies in effectively allocating limited development resources and developers in reviewing each other's code changes. Developers are unlikely to devote the same effort to inspecting each software artifact predicted to contain defects, since the effort varies with an artifact's size (cost) and the number of defects it exhibits (effectiveness). We propose to use Genetic Algorithms (GAs) for training prediction models to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects between multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving cost-effectiveness by up to 240%. Often, the top 10% of the predicted lines of code contain up to twice as many defects.
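
    A minimal sketch of the idea behind search-based, cost-aware training (our own toy Python setup, not the paper's exact algorithm): evolve the weights of a linear scoring model so that ranking artifacts by predicted score packs as many defects as possible into a fixed inspection budget of 10% of the lines of code.

        import random

        random.seed(0)

        # Hypothetical dataset: (feature vector, lines of code, defect count).
        DATA = [([random.random() for _ in range(3)], random.randint(50, 500),
                 random.randint(0, 5)) for _ in range(40)]

        def score(weights, features):
            return sum(w * f for w, f in zip(weights, features))

        def cost_effectiveness(weights, budget=0.10):
            """Defects found when inspecting files by descending score until
            `budget` of the total LOC is spent (a simple cost-aware fitness)."""
            ranked = sorted(DATA, key=lambda d: score(weights, d[0]), reverse=True)
            loc_budget = budget * sum(d[1] for d in DATA)
            found = spent = 0
            for _, loc, defects in ranked:
                if spent + loc > loc_budget:
                    break
                spent += loc
                found += defects
            return found

        def evolve(pop_size=30, generations=50, mut=0.2):
            pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
            for _ in range(generations):
                pop.sort(key=cost_effectiveness, reverse=True)
                parents = pop[: pop_size // 2]           # elitist selection
                children = []
                while len(parents) + len(children) < pop_size:
                    a, b = random.sample(parents, 2)
                    child = [random.choice(g) for g in zip(a, b)]  # crossover
                    if random.random() < mut:                      # mutation
                        i = random.randrange(3)
                        child[i] += random.gauss(0, 0.3)
                    children.append(child)
                pop = parents + children
            return max(pop, key=cost_effectiveness)

        best = evolve()
        print("defects found in top-10% LOC:", cost_effectiveness(best))

    The key design choice mirrored here is that the fitness function rewards defects found per unit of inspection cost directly, rather than classification accuracy.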